

Search for: All records

Creators/Authors contains: "Fund, Fraida"


  1. Free, publicly-accessible full text available July 30, 2026
  2. Data leakage remains a pervasive issue in machine learning (ML), especially when applied to science, leading to overly optimistic performance estimates and irreproducible findings. Despite its prevalence, data leakage receives limited attention in ML education, in part due to the lack of accessible, hands-on teaching resources. To address this gap, we developed interactive learning modules in which students reproduce examples from academic publications that are affected by data leakage, then repeat the evaluation without the data leakage error to see how the finding is affected. These modules were deployed by the authors in two introductory machine learning courses, enabling students to explore common forms of leakage and their impact on model reliability. Following their engagement with these materials, student feedback highlighted increased awareness of subtle pitfalls that can compromise machine learning workflows. 
    Free, publicly-accessible full text available July 29, 2026
  3. Thanks to increasing awareness of the importance of reproducibility in computer science research, initiatives such as artifact review and badging have been introduced to encourage reproducible research in this field. However, making "practical reproducibility" truly widespread requires more than just incentives. It demands an increase in capacity for reproducible research among computer scientists - more tools, workflows, and exemplar artifacts, and more human resources trained in best practices for reproducibility. In this paper, we describe our experiences in the first two years of the Summer of Reproducibility (SoR), a mentoring program that seeks to build global capacity by enabling students around the world to work with expert mentors while producing reproducibility artifacts, tools, and education materials. We give an overview of the program, report preliminary outcomes, and discuss plans to evolve this program. 
    Free, publicly-accessible full text available July 29, 2026
  4. Quantization is often cited as a technique for reducing model size and accelerating deep learning. However, past literature suggests that the effect of quantization on latency varies significantly across different settings, in some cases even increasing inference time rather than reducing it. To address this discrepancy, we conduct a series of systematic experiments on the Chameleon testbed to investigate the impact of three key variables on the effect of post-training quantization: the machine learning framework, the compute hardware, and the model itself. Our experiments demonstrate that each of these has a substantial impact on the overall inference time of a quantized model. Furthermore, we make experiment materials and artifacts publicly available so that others can validate our findings on the same hardware using Chameleon, and we share open educational resources on this topic that may be adopted in formal and informal education settings. 
    Free, publicly-accessible full text available May 19, 2026
  5. Thanks to advancements in wireless networks, robotics, and artificial intelligence, future manufacturing and agriculture processes may be capable of producing more output at lower cost through automation. With ultra-fast 5G mmWave wireless networks, data can be transferred to and from servers within a few milliseconds for real-time control loops, while robotics and artificial intelligence can allow robots to work alongside humans in factory and agriculture environments. One important consideration for these applications is whether the “intelligence” that processes data from the environment and decides how to react should be located directly on the robotic device that interacts with the environment - a scenario called “edge computing” - or whether it should be located on more powerful centralized servers that communicate with the robotic device over a network - “cloud computing.” For applications that require a fast response time, such as a robot that is moving and reacting to an agricultural environment in real time, there are two important tradeoffs to consider. On the one hand, the processor on the edge device is likely not as powerful as the cloud server, and may take longer to generate the result. On the other hand, cloud computing requires both the input data and the response to traverse a network, which adds some delay that may cancel out the faster processing time of the cloud server. Even with ultra-fast 5G mmWave wireless links, the frequent blockages that are characteristic of this band can still add delay. To explore this issue, we run a series of experiments on the Chameleon testbed emulating both the edge and cloud scenarios under various conditions, including different types of hardware acceleration at the edge and the cloud, and different types of network configurations between the edge device and the cloud. These experiments will inform future use of these technologies and serve as a jumping-off point for further research.
  6. With increasing recognition of the importance of reproducibility in computer science research, a wide range of efforts to promote reproducible research have been implemented across various sub-disciplines of computer science. These include artifact review and badging processes, and dedicated reproducibility tracks at conferences. However, these initiatives primarily engage active researchers and students already involved in research in their respective areas. In this paper, we present an argument for expanding the scope of these efforts to include a much larger audience, by introducing more reproducibility content into computer science courses. We describe various ways to integrate reproducibility content into the curriculum, drawing on our own experiences, as well as published experience reports from several sub-disciplines of computer science and computational science. 
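
The edge-versus-cloud tradeoff described in the 5G mmWave abstract above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the authors' experimental method: the `Scenario` fields, the additive latency model, the expected-value treatment of blockage, and every number below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical timing parameters, all in milliseconds (illustrative only)."""
    edge_compute_ms: float             # inference time on the slower edge processor
    cloud_compute_ms: float            # inference time on the faster cloud server
    network_rtt_ms: float              # round trip for input data plus response
    blockage_prob: float = 0.0         # assumed chance that a request hits a blockage
    blockage_penalty_ms: float = 0.0   # assumed extra delay when a blockage occurs


def edge_latency_ms(s: Scenario) -> float:
    # Edge computing: no network traversal, only local processing.
    return s.edge_compute_ms


def cloud_latency_ms(s: Scenario) -> float:
    # Cloud computing: faster processing, plus network delay and the
    # expected extra delay from link blockage (simple expected-value model).
    return s.cloud_compute_ms + s.network_rtt_ms + s.blockage_prob * s.blockage_penalty_ms


def faster_option(s: Scenario) -> str:
    # Return whichever placement has the lower expected response time.
    return "edge" if edge_latency_ms(s) <= cloud_latency_ms(s) else "cloud"


# Made-up numbers: a clear link versus a link with frequent blockage.
clear_link = Scenario(edge_compute_ms=40.0, cloud_compute_ms=5.0, network_rtt_ms=4.0)
blocked_link = Scenario(edge_compute_ms=40.0, cloud_compute_ms=5.0, network_rtt_ms=4.0,
                        blockage_prob=0.3, blockage_penalty_ms=200.0)

print(faster_option(clear_link))
print(faster_option(blocked_link))
```

With these made-up values the cloud is faster over the clear link, while the expected blockage delay tips the balance toward the edge device. Real deployments would need measured distributions rather than single averages, which is the kind of question the testbed experiments are meant to answer.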